DROPS

Document

DOI: 10.4230/LIPIcs.ESA.2023.15

Sorting Finite Automata via Partition Refinement

Authors: Ruben Becker, Manuel Cáceres, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Francisco Olivares, and Nicola Prezza

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)

Abstract

Wheeler nondeterministic finite automata (WNFAs) were introduced in (Gagie et al., TCS 2017) as a powerful generalization of prefix sorting from strings to labeled graphs. WNFAs admit optimal solutions to classic hard problems on labeled graphs and languages such as compression and regular expression matching. The problem of deciding whether a given NFA is Wheeler is known to be NP-complete (Gibney and Thankachan, ESA 2019). Recently, however, Alanko et al. (Information and Computation 2021) showed how to side-step this complexity by switching to preorders: letting Q be the set of states and δ the set of transitions, they provided a O(|δ|⋅|Q|²)-time algorithm computing a totally-ordered partition (i.e. equivalence relation) of the WNFA’s states such that (1) equivalent states recognize the same regular language, and (2) the order of (the classes of) non-equivalent states is consistent with any Wheeler order, when one exists. As a result, the output is a preorder of the states as useful for pattern matching as standard Wheeler orders. Further extensions of this line of work (Cotumaccio et al., SODA 2021 and DCC 2022) generalized these concepts to arbitrary NFAs by introducing co-lex partial preorders: in general, any NFA admits a partial preorder of its states reflecting the co-lexicographic order of their accepted strings; the smaller the width of such preorder is, the faster regular expression matching queries can be performed. To date, the fastest algorithm for computing the smallest-width partial preorder on NFAs runs in O(|δ|² + |Q|^{5/2}) time (Cotumaccio, DCC 2022), while on DFAs the same task can be accomplished in O(min(|Q|²log|Q|, |δ|⋅|Q|)) time (Kim et al., CPM 2023). In this paper, we provide much more efficient solutions to the co-lex order computation problem. Our results are achieved by extending a classic algorithm for the relational coarsest partition refinement problem of Paige and Tarjan to work with ordered partitions. More specifically, we provide a O(|δ|log|Q|)-time algorithm computing a co-lex total preorder when the input is a Wheeler NFA, and an algorithm with the same time complexity computing the smallest-width co-lex partial order of any DFA. In addition, we present implementations of our algorithms and show that they are very efficient also in practice.

Cite as

Ruben Becker, Manuel Cáceres, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Francisco Olivares, and Nicola Prezza. Sorting Finite Automata via Partition Refinement. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{becker_et_al:LIPIcs.ESA.2023.15,
  author =	{Becker, Ruben and C\'{a}ceres, Manuel and Cenzato, Davide and Kim, Sung-Hwan and Kodric, Bojana and Olivares, Francisco and Prezza, Nicola},
  title =	{{Sorting Finite Automata via Partition Refinement}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{15:1--15:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.15},
  URN =		{urn:nbn:de:0030-drops-186684},
  doi =		{10.4230/LIPIcs.ESA.2023.15},
  annote =	{Keywords: Wheeler automata, prefix sorting, pattern matching, graph compression, sorting, partition refinement}
}

@InProceedings{becker_et_al:LIPIcs.ESA.2023.15,
  author =	{Becker, Ruben and C\'{a}ceres, Manuel and Cenzato, Davide and Kim, Sung-Hwan and Kodric, Bojana and Olivares, Francisco and Prezza, Nicola},
  title =	{{Sorting Finite Automata via Partition Refinement}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{15:1--15:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.15},
  URN =		{urn:nbn:de:0030-drops-186684},
  doi =		{10.4230/LIPIcs.ESA.2023.15},
  annote =	{Keywords: Wheeler automata, prefix sorting, pattern matching, graph compression, sorting, partition refinement}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.16

Faster Prefix-Sorting Algorithms for Deterministic Finite Automata

Authors: Sung-Hwan Kim, Francisco Olivares, and Nicola Prezza

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

Sorting is a fundamental algorithmic pre-processing technique which often allows to represent data more compactly and, at the same time, speeds up search queries on it. In this paper, we focus on the well-studied problem of sorting and indexing string sets. Since the introduction of suffix trees in 1973, dozens of suffix sorting algorithms have been described in the literature. In 2017, these techniques were extended to sets of strings described by means of finite automata: the theory of Wheeler graphs [Gagie et al., TCS'17] introduced automata whose states can be totally-sorted according to the co-lexicographic (co-lex in the following) order of the prefixes of words accepted by the automaton. More recently, in [Cotumaccio, Prezza, SODA'21] it was shown how to extend these ideas to arbitrary automata by means of partial co-lex orders. This work showed that a co-lex order of minimum width (thus optimizing search query times) on deterministic finite automata (DFAs) can be computed in O(m² + n^{5/2}) time, m being the number of transitions and n the number of states of the input DFA. In this paper, we exhibit new combinatorial properties of the minimum-width co-lex order of DFAs and exploit them to design faster prefix sorting algorithms. In particular, we describe two algorithms sorting arbitrary DFAs in O(mn) and O(n² log n) time, respectively, and an algorithm sorting acyclic DFAs in O(m log n) time. Within these running times, all algorithms compute also a smallest chain partition of the partial order (required to index the DFA). We present an experiment result to show that an optimized implementation of the O(n² log n)-time algorithm exhibits a nearly-linear behaviour on large deterministic pan-genomic graphs and is thus also of practical interest.

Cite as

Sung-Hwan Kim, Francisco Olivares, and Nicola Prezza. Faster Prefix-Sorting Algorithms for Deterministic Finite Automata. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 16:1-16:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{kim_et_al:LIPIcs.CPM.2023.16,
  author =	{Kim, Sung-Hwan and Olivares, Francisco and Prezza, Nicola},
  title =	{{Faster Prefix-Sorting Algorithms for Deterministic Finite Automata}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{16:1--16:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.16},
  URN =		{urn:nbn:de:0030-drops-179707},
  doi =		{10.4230/LIPIcs.CPM.2023.16},
  annote =	{Keywords: String Matching, Deterministic Finite Automata, Graph Indexing, Co-lexicographical Sorting}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2022.61

Simple Order-Isomorphic Matching Index with Expected Compact Space

Authors: Sung-Hwan Kim and Hwan-Gue Cho

Published in: LIPIcs, Volume 248, 33rd International Symposium on Algorithms and Computation (ISAAC 2022)

Abstract

In this paper, we present a novel indexing method for the order-isomorphic pattern matching problem (also known as order-preserving pattern matching, or consecutive permutation matching), in which two equal-length strings are defined to match when X[i] < X[j] iff Y[i] < Y[j] for 0 ≤ i,j < |X|. We observe an interesting relation between the order-isomorphic matching and the insertion process of a binary search tree, based on which we propose a data structure which not only has a concise structure comprised of only two wavelet trees but also provides a surprisingly simple searching algorithm. In the average case analysis, the proposed method requires 𝒪(R(T)) bits, and it is capable of answering a count query in 𝒪(R(P)) time, and reporting an occurrence in 𝒪(lg |T|) time, where T and P are the text and the pattern string, respectively; for a string X, R(X) is the total time taken for the construction of the binary search tree by successively inserting the keys X[|X|-1],⋯,X[0] at the root, and its expected value is 𝒪(|X|lgσ) where σ is the alphabet size. Furthermore, the proposed method can be viewed as a generalization of some other methods including several heuristics and restricted versions described in previous studies in the literature.

Cite as

Sung-Hwan Kim and Hwan-Gue Cho. Simple Order-Isomorphic Matching Index with Expected Compact Space. In 33rd International Symposium on Algorithms and Computation (ISAAC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 248, pp. 61:1-61:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{kim_et_al:LIPIcs.ISAAC.2022.61,
  author =	{Kim, Sung-Hwan and Cho, Hwan-Gue},
  title =	{{Simple Order-Isomorphic Matching Index with Expected Compact Space}},
  booktitle =	{33rd International Symposium on Algorithms and Computation (ISAAC 2022)},
  pages =	{61:1--61:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-258-7},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{248},
  editor =	{Bae, Sang Won and Park, Heejin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2022.61},
  URN =		{urn:nbn:de:0030-drops-173466},
  doi =		{10.4230/LIPIcs.ISAAC.2022.61},
  annote =	{Keywords: Compact Data Structure, String Matching, Order-Preserving Matching, Suffix Array, FM-index, Binary Search Tree}
}

Document

DOI: 10.4230/LIPIcs.CPM.2021.18

A Compact Index for Cartesian Tree Matching

Authors: Sung-Hwan Kim and Hwan-Gue Cho

Published in: LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

Abstract

Cartesian tree matching is a recently introduced string matching problem in which two strings match if their corresponding Cartesian trees are the same. It is considered appropriate to find patterns regarding their shapes especially in numerical time series data. While many related problems have been addressed, developing a compact index has received relatively less attention. In this paper, we present a 3n+o(n)-bit index that can count the number of occurrences of a Cartesian tree pattern in 𝒪(m) time where n and m are the text and pattern length. To the best of our knowledge, this work is the first 𝒪(n)-bit compact data structure for indexing for this problem.

Cite as

Sung-Hwan Kim and Hwan-Gue Cho. A Compact Index for Cartesian Tree Matching. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 18:1-18:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{kim_et_al:LIPIcs.CPM.2021.18,
  author =	{Kim, Sung-Hwan and Cho, Hwan-Gue},
  title =	{{A Compact Index for Cartesian Tree Matching}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{18:1--18:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.18},
  URN =		{urn:nbn:de:0030-drops-139699},
  doi =		{10.4230/LIPIcs.CPM.2021.18},
  annote =	{Keywords: String Matching, Suffix Array, FM-index, Compact Index, Cartesian Tree Matching}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2020.35

Indexing Isodirectional Pointer Sequences

Authors: Sung-Hwan Kim and Hwan-Gue Cho

Published in: LIPIcs, Volume 181, 31st International Symposium on Algorithms and Computation (ISAAC 2020)

Abstract

Many sequential and temporal data have dependency relationships among their elements, which can be represented as a sequence of pointers. In this paper, we introduce a new string matching problem with a particular type of strings, which we call isodirectional pointer sequence, in which each entry has a pointer to another entry. The proposed problem is not only a formalization of real-world dependency matching problems, but also a generalization of variants of the string matching problem such as parameterized pattern matching and Cartesian tree matching. We present a 2nlgσ+2n+o(n)-bit index that preprocesses the text T[1:n] so as to count the number of occurrences of pattern P[1:m] in 𝒪(mlgσ) where σ is the number of distinct lengths of pointers in T. Our index is also easily implementable in practice because it consists of wavelet trees and range maximum query index, which are widely used building blocks in many other compact data structures. By compressing the wavelet trees, the index can also be stored into 2nH^*₀(T)+2n+o(n) bits where H^*₀(T) is the 0-th order empirical entropy of the distribution of pointer lengths of T.

Cite as

Sung-Hwan Kim and Hwan-Gue Cho. Indexing Isodirectional Pointer Sequences. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 35:1-35:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{kim_et_al:LIPIcs.ISAAC.2020.35,
  author =	{Kim, Sung-Hwan and Cho, Hwan-Gue},
  title =	{{Indexing Isodirectional Pointer Sequences}},
  booktitle =	{31st International Symposium on Algorithms and Computation (ISAAC 2020)},
  pages =	{35:1--35:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-173-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{181},
  editor =	{Cao, Yixin and Cheng, Siu-Wing and Li, Minming},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2020.35},
  URN =		{urn:nbn:de:0030-drops-133797},
  doi =		{10.4230/LIPIcs.ISAAC.2020.35},
  annote =	{Keywords: String Matching, Suffix Array, FM-index, Wavelet Tree, Range Minimum Query, Parameterized String Matching, Cartesian Tree Matching}
}

Search Results

Documents authored by Kim, Sung-Hwan

Sorting Finite Automata via Partition Refinement

Abstract

Cite as

Faster Prefix-Sorting Algorithms for Deterministic Finite Automata

Abstract

Cite as

Simple Order-Isomorphic Matching Index with Expected Compact Space

Abstract

Cite as

A Compact Index for Cartesian Tree Matching

Abstract

Cite as

Indexing Isodirectional Pointer Sequences

Abstract

Cite as

Thanks for your feedback!

Could not send message